instructions for the organization and format of each section. Unfortunately,
few studies readily lend themselves to the archiving process and can be
easily described by following the instructions contained herein. Studies,
as ultimately implemented, often deviate considerably from their original
proposals, and researchers do not always provide adequate updated accounts
of the changes made. Factors critically affecting the data, such as changes
in sampling plans and modifications to instruments, often are not described
in study reports. The archivist is given the awesome task of filling in
the gap between a general treatment of methodology in proposals written
months before the commencement of the study and a final report of findings.
Awareness of this problem should alert the archivist to question the data
and not complacently accept the descriptions of methodologies purportedly
used in the study. Recognition of the failure of researchers to document
the dynamic nature of studies should also remind the archivist that not
all components of the standard will be available for documentation, despite
the most assiduous attempts made by archivists to uncover them.

In addition to changes in implementation, differences in the sources of
data affect the archiving process: some studies involve direct data
collection; others reanalyze data collected for previous studies. Changes
in implementation and differences in data sources both suggest a flexible
approach to writing file-level documents.




Archivists using this standard are urged to strive to incorporate all
components proposed in the standard that are applicable to the data and
available to them and to deviate from the standard only when the
methodological context of the data dictates a departure. Finally, so as
not to follow the error of some researchers who fail to report their
departures from study plans, the archivist should note that the standard
has been amended to conform to the nature of an individual study and its
files.



File-level documentation refers to the description of the contents of
a single data file or a group of identically structured files. In most
cases, a file-level document is developed for each instrument or data
collection form. If different files were created using the same instrument
or form, only one file-level document is produced. If one document refers
to several files, a brief section indicating this precedes the file-level
document.

The file-level document consists of three sections:

• File Background - information pertaining to the origin, purpose,
  and collection methodology of the data;

• Codebook - descriptions of each specific data item contained in
  the file;

• Supplemental Information - additional information about the file,
  including extended coding instructions, recodes, detailed scale
  and new variable calculations, and copies of original project documents.



The following outline lists the kinds of information or components
included in the file background section of the file-level documentation
standard:

A. Abstract

B. Unit of Observation

C. Scale

D. Time Information
   1. Collection Time Frame: When?
   2. Collection Time Frame: How Often?
   3. Data Time Frame

E. Data Collection
   1. Sampling and Target Population
      a. Universe
      b. Target population
      c. Obtained population
   2. Data Collection Method
   3. Data Coding
   4. Data Editing and Cleaning

F. Data Quality

G. Problems and Anomalies

H. Access
   1. Location
   2. Format
   3. Special Handling
   4. File Organization
   5. Contact

In this section, each component of the file background is presented
in two ways. The first describes the component; the second provides an
example of each component's use. The descriptions and their accompanying
examples show the depth of information recommended, not the complete set
of possible alternatives. The examples, therefore, should not be seen as
rigorous models of exactly how things must be expressed.

These descriptions focus on documentation of files containing data
collected for the first time, specifically for the study being archived.
Many studies do not entail data collection, but rather utilize data collected
for other studies. For such "secondary analytic" studies, the standard
must be amended somewhat. The modifications required are described in
section I, Modifications for Secondary Analytic Studies.



A. ABSTRACT

The abstract briefly describes a data file. It discusses the purpose
of the data in the file and the types of information the records in the
file contain. The abstract also relates information about the project in
which the data were collected. Usually ranging from 75 to 150 words
in length, the abstract will help readers determine the file's applicability
to their analysis activities.

EXAMPLE: ABSTRACT

The Administrative Office Criminal Terminations
file contains information on each criminal
case before the Federal Court System which was
closed during a given fiscal year. The data
contained in the file include information
on the offense, disposition, sentence, court,
and judge. These data are generated as part of
the normal court reporting system. Case docket
sheets were used to complete forms JS-2 and JS-3
(terminations), which were used to create the
terminations records contained on this file.
The data are collected continuously, and a
complete file is generated yearly. A yearly
file consists of 33 data items and about 60,000
records.
B. UNIT OF OBSERVATION

This component describes the subjects or units on whom the data were
collected or, in other words, who or what was observed in data collection
and reported in the data set. These research units can include

o individual: data pertaining to a particular person, such as a
  student or defendant;

o state: data pertaining to a particular state;

o district: data pertaining to the district.

In hierarchically organized or "mixed" data files, data on multiple
units of observation sometimes exist within the same file. For example,
a single data file may contain some records referring to a state and other
records referring to individuals within a state. While records for the
individual and the state usually differ in both length and content, they
appear within the same file. When such a mixed file is documented, the
unit of observation for each record type and the relationships between
record types are described.



EXAMPLE: UNIT OF OBSERVATION (1)

The instructional unit, that is, a class or
a subset of the class, serves as the unit of
observation in the Regular Program Description
file.

EXAMPLE: UNIT OF OBSERVATION (2)

The National Crime Survey (NCS) file consists
of three data types, each contained in its
own record type. Household records contain
information referring to household data,
such as number of occupants, etc. Individual
records contain data referring to the person
interviewed. Incident records contain information
describing each incident the individual experienced.
The file is organized with a household record
first and then an individual record followed
by a varying number of incident records for that
individual.
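
A secondary analyst reading a mixed file like the one in Example (2) has to
rebuild the household / individual / incident nesting from the order of the
records. The Python sketch below illustrates one way this might be done. The
one-character record-type codes ("H", "P", "I") in the first column and the
sample lines are hypothetical; they are not taken from the NCS documentation.

```python
def group_hierarchical(lines):
    """Nest records by file order: household -> individuals -> incidents."""
    households = []
    for line in lines:
        record_type, data = line[:1], line[1:]
        if record_type == "H":  # hypothetical code for a household record
            households.append({"data": data, "individuals": []})
        elif record_type == "P":  # hypothetical code for an individual record
            if not households:
                raise ValueError("individual record before any household record")
            households[-1]["individuals"].append({"data": data, "incidents": []})
        elif record_type == "I":  # hypothetical code for an incident record
            if not households or not households[-1]["individuals"]:
                raise ValueError("incident record before any individual record")
            households[-1]["individuals"][-1]["incidents"].append(data)
        else:
            raise ValueError("unknown record type: %r" % record_type)
    return households


# Illustrative lines only: two households, the first with two incidents.
sample = ["H0004", "P31M", "Iburglary", "Itheft", "H0002", "P45F"]
nested = group_hierarchical(sample)
```

Documenting the relationships between record types, as the component
requires, is what lets a reader write the ordering rules above with
confidence rather than by inspection of the file.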
C. SCALE

The items pertaining to scale are

• number of records, that is, the actual number of records
  on the file;

• number of variables collected, that is, the number
  of data fields each record contains.

If the file is organized hierarchically (i.e., contains data of differing
record types), the number of records and the number of fields will be shown
by record type. This section helps a researcher determine the type and
extent of resources necessary to process the file.



EXAMPLE: SCALE

The file consists of 33 data items and the
following number of respondents per year:

1973 - 59,266
1975 - 56,815
1976 - 59,512
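
For a hierarchical file, the scale figures are reported by record type. As a
rough illustration, the Python sketch below tallies the number of records and
the number of fields for each record type. The input layout (a record-type
label paired with a list of field values) and the sample values are
assumptions made for the example, not part of the standard.

```python
from collections import Counter


def scale_by_record_type(records):
    """Summarize (record_type, fields) pairs as counts per record type."""
    record_counts = Counter()
    field_counts = {}
    for record_type, fields in records:
        record_counts[record_type] += 1
        # Keep the widest record seen, in case some records are short.
        field_counts[record_type] = max(field_counts.get(record_type, 0), len(fields))
    return {
        record_type: {"records": record_counts[record_type],
                      "fields": field_counts[record_type]}
        for record_type in record_counts
    }


# Illustrative records only.
records = [
    ("household", ["0004", "urban"]),
    ("individual", ["31", "M", "employed"]),
    ("incident", ["burglary"]),
    ("incident", ["theft"]),
]
summary = scale_by_record_type(records)
```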



D. TIME INFORMATION

This component tells when and how often the data in a file were collected
and what time period they describe.

1. Collection Time Frame: When?

The collection time frame tells when data collection began and ended.
In the case of surveys, this information is accurate to the month, since
seasonal effects can influence the data. If the data were collected at a
specific time of day, in an unusual time frame, or on a continuing basis,
these specific factors are reported.



2. Collection Time Frame: How Often?

Data can be collected once, several times, or continually. The frequency
of data collection has a direct bearing on the kinds of analyses researchers
can perform on the data and the quality of the data. For example, if data
were collected several times from the same subjects, a secondary analyst
should recognize that the sets of observations are correlated and use an
analytic procedure which takes this important factor into account (e.g.,
repeated measures). Attitude observations collected on a pre/post-test
basis are suspect because of the possible contaminating effects of the pretest
on the posttest results. Achievement data can also be questionable if
the data were collected so frequently that sufficient time was not allowed
for a gain to be apparent in standardized test scores.



3. Data Time Frame

This aspect refers to the time frame the data describe and clarifies
for the researcher the time period to which the data refer. This time period
does not always correspond to the time when the data were collected. Frequently,
data are collected about a retrospective time period; for instance, a subject
may be asked how many times he had been attacked in the previous year.
In other cases, data collected at one time may refer to events that occurred
years in the past, as in the case of the 197? National Crime Survey, which
contains data on crimes that occurred in 1976.

EXAMPLE: TIME INFORMATION

Data were collected on a continuing basis, and
original data were reported to the Administrative
Office monthly. The data record is generated
at the close of a criminal case. A yearly
tape is created at the end of each fiscal
year; it contains the records for cases closed
in the previous year. However, the offenses
represented may have occurred at any time in the
past 20 years.
E. DATA COLLECTION

This component explains the process used to collect the data contained
in the file. Aspects of the collection process covered in this section are

• sample, target, and obtained populations;

• data collection method;

• coding;

• editing and cleaning.


1. Sample, Target, and Obtained Population

The universe, target, and obtained population of the data set are
described here. The overall intent of the sampling component is to allow
a researcher to understand (1) the nature of the population from which the
sample was drawn in order to determine to whom findings can be generalized
(universe); (2) what population this particular data set describes (target);
and (3) whether the sample is adequate to support her/his research goals
(obtained). For instance, after reading the information in this section,
a researcher interested in a particular data set may find the sample too
limited for her/his purposes.

In a complex weighted survey, this component's description can be quite
extensive and require detail within each subsection. Some studies
target the entire universe of respondents; these so-called non-sampled data
sets still require a sample description.
a. Universe

Specifically, the explanation of the data universe describes the membership
and size of the universe and the reasons for selecting the universe
for analysis. Universes could be

• all 15-year-old males in the State of New York;
• working women;
• criminal cases closed in fiscal year 1974;
• people convicted of murder in California in 1977.



b. Target Population

For sampled data files, target populations are discussed. The discussion
provides information on the intended target population and completely describes
the sampling plan, including sampling goals, sampling strategy, and target
sample characteristics. The description of sampling goals details the factors
contributing to the decision to use some type of sample. These factors
include

• economic - "The sample was limited to 500 subjects, since the budget
  did not permit the study to collect more data."

• practical - "The study chose subjects from the Boston area for
  follow-up, since the researcher's offices are in Boston."

• statistical - "The study design required the sample to be
  representative in terms of race, age, and sex."

In addition, this section includes the size of the sample, its characteristics,
and the reasons for its choice.

The sampling description includes the type of sample used,
the specific criteria for its selection, and the methodology for drawing
it. Types of samples include*

• probability sampling
• proportionate stratified sample
     choice of stratified factors: i.e., race, income, district
• disproportionate stratified sample
     choice of stratified factors
• cluster sampling
• multi-stage sampling or multi-phase sampling (double sampling)
• samples with varying probabilities
• area sampling
• quota sampling
• purposive sampling

The selected sample is described, detailing, when applicable, any
potential bias which may occur in future analysis due to sampling techniques
and special features of the target population.



* not an exhaustive list

c. Obtained Population

The description of the characteristics and size of the collected data
file contains information on the research units in the final data file
which, in some cases, differ substantially from the intended target
population. Substantial differences between the target and obtained populations
are described in this section. These differences arise from problems
encountered during data collection which eliminated portions of the target
population from the file or caused a change in the sampling strategy.

This section also describes the magnitude of, and reasons for, nonresponse.
If the nonresponse rate was high enough to require that the data be adjusted
prior to analysis, the adjustments are also described. In addition, standard
errors of estimate for certain characteristics are presented here or in an
appendix, or referenced if they were published separately.

EXAMPLE: DATA COLLECTION (1)

Universe: all criminal terminations for the
years 1973, 1975, and 1976.

Target population: a 1% sample of the records drawn
through a proportionate stratified sample, stratified
by district and by year. The sample was chosen to obtain
approximately 2,000 records. Sample cells were defined
for each district for each year in the universe. Cell
size was calculated in proportion to the number of cases
in each district. The actual sample was drawn by randomly
selecting the proper number of records from within each
cell. This selection was done by dividing the number
of records in the cell by the number of records to be
selected in that cell. A random number was then generated
between 1 and this quotient ("N"), and that many records were
skipped at the beginning of the cell. Each "Nth" record
within the cell was then chosen. Since the order of records
within each cell was sequentially assigned by docket number,
cell contents were considered randomly ordered.

Obtained Population: 1,121 cases were obtained from
a sample of 1,600. Nonresponse was limited to 12 specific
districts and complete cells were obtained for all other
districts. No analysis has been done to determine the
impact of nonresponse.
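
The drawing procedure described in this example (proportionate allocation to
cells, then a random skip followed by every Nth record) can be sketched in
Python as follows. This is a minimal illustration, not the study's actual
program: the function names are hypothetical, the random start is taken
between 0 and N-1 rather than 1 and N, and rounding means the cell sizes may
not add up exactly to the target.

```python
import random


def proportional_cell_sizes(cell_counts, target_total):
    """Allocate the target sample across cells in proportion to cell size."""
    total = sum(cell_counts.values())
    return {cell: round(target_total * count / total)
            for cell, count in cell_counts.items()}


def systematic_sample(records, n_select, rng):
    """Skip a random number of records, then take every Nth record."""
    if n_select <= 0 or not records:
        return []
    interval = max(1, len(records) // n_select)  # the quotient "N"
    skip = rng.randint(0, interval - 1)          # assumed zero-based random start
    return records[skip::interval][:n_select]


# Illustrative use: one cell of 100 records, 10 to be selected.
cell = list(range(100))
picked = systematic_sample(cell, 10, random.Random(1))
```

Because every Nth record is taken, the result is only as random as the
ordering of the cell, which is why the example documents the assumption
about docket-number order.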



EXAMPLE: DATA COLLECTION (2)

Universe: all 2,000 students in James Monroe High
School.

Target Population: Same as universe.

Obtained Population: 1,754 students were interviewed.
Nonresponse rate was 12.3%. Of the nonrespondents,
1.7% were unsuitable for interview, 6.2% refused, 2.4%
were away from home, 1.6% were out at time of interview,
and the remainder were not interviewed for other reasons.
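
The nonresponse rate reported in this example follows directly from the
target and obtained counts. The short Python sketch below shows the
arithmetic; the function name is hypothetical.

```python
def nonresponse_rate(target_size, obtained_size):
    """Percentage of the target population missing from the obtained file."""
    return 100.0 * (target_size - obtained_size) / target_size


# Example (2): 1,754 of 2,000 students were interviewed.
school_rate = nonresponse_rate(2000, 1754)

# Example (1): 1,121 cases were obtained from a sample of 1,600.
court_rate = nonresponse_rate(1600, 1121)
```

For Example (2) this gives 246 / 2,000, or 12.3 percent, which matches the
figure stated in the example.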
2. Data Collection Method

The intent of this component is to enable a researcher to review data
collection forms and methods and to understand how they were used in a
project. Information is also provided on follow-up techniques and other
procedures for improving response rates.

Survey data are usually collected using a data
collection instrument or questionnaire, while for nonsurveys data are
collected using a form or the output of an administrative system. Survey
data collection methods include*

• self-administered questionnaire;
• mailed questionnaire;
• oral interview (face-to-face or by telephone);
• observation;
• administrative output.

Administrative data, also known as process-produced data, are generated
through the normal operation of an administrative system. In other words,
these data are not collected specifically for research purposes, but as
part of an ongoing management function. Examples are a personnel file
containing information about individuals' salaries and a hospital information
system containing accounting information about each patient.

This component describes the method of data collection as well as the
data collection form. A copy of the form is placed in an appendix. A
description of special instructions for the project's data collection
staff as well as copies of documents containing unusual interviewer's
instructions are also included in an appendix.



* not an exhaustive list



EXAMPLE: DATA COLLECTION METHOD

The Classroom Roster was a form completed by all classroom
teachers in grades 3 and 4 in sample schools. The Roster
provided an unduplicated count of students participating
in compensatory education programs. Each teacher listed
all of the children in terms of sex, ethnicity, reading
achievement level, free lunch program participation, and
participation in compensatory programs.



3. Data Coding

Manual and machine coding techniques applied to the data are described
here. If special procedures or handling were required, they are also explained.

Manual data coding is often performed prior to data input. Such coding
commonly occurs when a study employs questionnaires containing open-ended
questions, or when instruments include questions that ask the respondent
to choose among several alternatives (e.g., the offense category in a criminal
tape). Machine data coding techniques are those techniques automatically
applied during the data entry process, such as left-zero fill, recoding
blanks, etc.
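
The two machine coding techniques named above can be illustrated with a
short Python sketch. The function names and the replacement code for blanks
are hypothetical; an actual study would document its own field widths and
codes.

```python
def left_zero_fill(value, width):
    """Pad a numeric field on the left with zeros to a fixed width."""
    return value.strip().zfill(width)


def recode_blank(value, blank_code):
    """Replace an all-blank field with a documented code; keep other values."""
    return blank_code if value.strip() == "" else value


padded = left_zero_fill(" 42", 5)   # a 5-column numeric field
recoded = recode_blank("   ", "9")  # "9" is an assumed missing-data code
```

Recording transformations of this kind matters because they are invisible in
the final file: a stored "9" could be an answer or a recoded blank unless the
documentation says which.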



EXAMPLE: DATA CODING

Each questionnaire was manually reviewed for open-ended
questions and for interviewer notations indicating problematic
questions or responses (e.g., a person gave more than one
response to a question calling for a single response). Each
problematic question or response was reviewed; the most
reasonable response was chosen or the field was coded as "missing."
Open-ended questions, Q3 and Q5, were manually coded
according to a coding scheme which appears in the appendix.



4. Data Editing and Cleaning

Data editing and cleaning consistency checks performed on data include
syntactic checks and semantic checks.

Syntactic checks deal with the form and characteristics of an individual
data field and insure that each field conforms to individually defined
characteristics. A SEX field might be specified as alphabetical, with
acceptable values of "M," "F," or "blank." An AGE field might be defined
as numerical, with values ranging from "1" through "99." A nominal variable
like RACE could be defined as numeric, with values of "1," "2," "4," "7."
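
The syntactic checks described above can be expressed as one rule per field.
The Python sketch below uses the SEX, AGE, and RACE definitions from the
text; the dictionary layout, the function name, and the choice to represent
a blank as an empty string are assumptions made for illustration.

```python
# One rule per field; each rule sees only that field's own value.
FIELD_RULES = {
    "SEX": lambda v: v in {"M", "F", ""},                  # "" stands for blank
    "AGE": lambda v: v.isdecimal() and 1 <= int(v) <= 99,
    "RACE": lambda v: v in {"1", "2", "4", "7"},
}


def syntactic_errors(record):
    """Return the names of fields whose values break their own definition."""
    return sorted(name for name, is_valid in FIELD_RULES.items()
                  if not is_valid(record.get(name, "")))


clean = syntactic_errors({"SEX": "M", "AGE": "34", "RACE": "2"})
dirty = syntactic_errors({"SEX": "X", "AGE": "0", "RACE": "3"})
```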

Semantic checks investigate relationships between two or more variables
and insure that data within a record are consistent and reasonable. In a
survey of elementary school students, a student with a GRADE of 1 must have
an AGE between 5 and 8. In another survey, a respondent's AGE cannot be
less than his or her child's AGE. These semantic specifications could become
quite extensive and complex since all possible relationships between variables
may be considered. For example, in an international economic data set,
a nation's GNP can be related to its population, industrialization level,
and geographic location through a series of complex models. In a criminal
record system, a sentence can be related to the type of crime and the
defendant's past record.
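
Unlike syntactic checks, semantic checks compare fields with one another.
The Python sketch below encodes the two relationships given as examples
above. The field name CHILD_AGE, the function name, and the default age
bounds for grade 1 are assumptions for illustration; a real cleaning
specification would state its own limits.

```python
def semantic_errors(record, grade1_age_range=(5, 8)):
    """Check cross-field consistency within a single record."""
    errors = []
    low, high = grade1_age_range  # assumed bounds for a first-grade student
    if record.get("GRADE") == 1 and not (low <= record["AGE"] <= high):
        errors.append("AGE out of range for GRADE 1")
    child_age = record.get("CHILD_AGE")  # hypothetical field name
    if child_age is not None and record["AGE"] < child_age:
        errors.append("respondent younger than child")
    return errors


consistent = semantic_errors({"GRADE": 1, "AGE": 6})
inconsistent = semantic_errors({"GRADE": 1, "AGE": 12})
```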

The checks performed on the data are described. If a complete set
of cleaning specifications was developed for the data, these may appear
in an appendix to the documentation. The key part of this section is to
highlight any broad problems uncovered and to detail any corrective actions
which may have been taken.

EXAMPLE: DATA EDITING AND CLEANING

An in-depth data cleaning analysis was performed upon
the data for the years 1973, 1974, and 1976. This analysis
consisted of range and value checks for each data item
and a number of consistency checks. The findings are
available in a report entitled Data Cleaning of the Criminal
Terminations Data Tapes, dated April, 1976. The major
conclusions of the report indicate that the data were,
for the most part, within expected ranges, although significant
amounts of data were missing from the SEX and RACE data
fields.

F. DATA QUALITY

This section's goal is to help a researcher determine the overall accuracy
and quality of the data set. The value of an otherwise interesting data
set can be severely limited if the original data collector (or archivist,
in some cases) failed to conduct quality analysis.

The validation and reliability analyses performed on the data and the
key findings of these analyses are described here. There are three types
of quality analyses: reliability, validity, and correctness.

Reliability analyses are designed to insure that the responses to particular
questions or items in the data collection instrument are replicable. In
other words, reliability analyses determine if the same question or item
will produce the same results consistently over time. Usually performed
as the data collection instrument is pretested, these tests are fully described
in this component or referenced, if they were described in a published report.

Validity tests determine if a question or item accurately measures
and reports the information a researcher attempted to analyze. For example,
does a question about home value properly represent a respondent's affluence?

Correctness checks determine if the data on the collection forms are
actually correct. Types of correctness checks include the recollection
of data from a sample of the obtained population and independent verification
of the data from outside sources.
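
A correctness check based on recollected data amounts to comparing each
sampled record on the file with its recollected counterpart and reporting
the share that disagree, field by field. The Python sketch below shows one
possible form. The record keys, field names, and values are invented for
illustration and do not come from any actual study.

```python
def field_error_rates(file_records, recollected_records, fields):
    """Percent of recollected records whose file value differs, per field."""
    rates = {}
    for field in fields:
        mismatches = sum(
            1 for key, recollected in recollected_records.items()
            if file_records[key][field] != recollected[field]
        )
        rates[field] = 100.0 * mismatches / len(recollected_records)
    return rates


# Illustrative data: records keyed by a hypothetical docket number.
on_file = {
    "001": {"offense": "A", "sentence": "12"},
    "002": {"offense": "B", "sentence": "06"},
    "003": {"offense": "C", "sentence": "24"},
    "004": {"offense": "A", "sentence": "00"},
}
recollected = {
    "001": {"offense": "A", "sentence": "12"},
    "002": {"offense": "D", "sentence": "06"},
    "003": {"offense": "C", "sentence": "24"},
    "004": {"offense": "A", "sentence": "00"},
}
rates = field_error_rates(on_file, recollected, ["offense", "sentence"])
```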

If no validation or reliability analysis was performed, the reasons
for assuming reliability must be presented. This requirement applies equally
to newly collected or administrative data.
EXAMPLE: DATA QUALITY

The criminal terminations data were subjected to a
validity analysis based on a sample of records recollected
from the original data source (docket sheets). A complete
report on the findings of the analysis can be found in
Reliability of the Criminal Terminations Data Tapes, dated
6 January, 1979. The report indicated that considerable
error exists within the offense, disposition, sentence,
term of imprisonment, and term of probation fields. Further,
the report states that these errors are substantial and,
in some cases, that over 20% of the data records are in
error. The report's final conclusions are that the data's
reliability is low and, therefore, its use limited.

G. PROBLEMS AND ANOMALIES

If problems and anomalies, and strategies for dealing with them, are
not clearly explained, they can lead to improper data analysis. Problems
and anomalies are usually uncovered as a result of three processes: 1)
data collection, 2) data editing and cleaning, and 3) data analysis. Often
these problems are not formally documented. In such cases, this part of
the documentation is based on interviews with persons who have worked with
the data in the past.

Problems and anomalies arising from the data collection effort might
include incompleteness of the sample, anomalies in the instrument or its
instructions, and adjustments made to the data after their original collection.
Since ambiguous questions or instructions may have been identified
during data collection, changes made during this process should be documented.
Problems and anomalies discovered through previous analysis activities,
an explanation of their source, and suggestions for treating them are also
included.

EXAMPLE: PROBLEMS AND ANOMALIES

A number of serious problems exist within this data
set. A cleaning analysis uncovered anomalies in many
data fields. The class ID contains spurious numbers in
its first two digits (which should have indicated school
building). In a large number of cases, sex and race
fields are blank. In other fields, a small number of
cases contained spurious response codes.
H. ACCESS

Information related to access to the data set is included in this section.
This information concerns location, format, and special handling.

1. Location

The present location of the data set is specified, and instructions
are given for obtaining additional copies of it from the original or current
source. If the data set is available locally, the tape or disk number,
file name, and sequence number are included.

2. Format

Information about the actual technical format of the tape allows programmers
at a removed location to read the tape properly. The file-transfer standard
document details the acceptable formats for data file transfer.

3. Special Handling

Special restrictions on, or handling considerations for, the data set
are included here. In some studies, the release of certain data fields
may be restricted. This may be because of pledges of confidentiality given
during the original data collection process or subsequent policy decisions.

Other limitations might be that only aggregated data may be released,
as in the case of the U.S. Census data. Here, individual-level data are
not released, but special aggregations of the data are performed in response
to researchers' requests.
4. File Organization

The manner in which the file is sorted or ordered is described here.

5. Contact

The name, telephone number, and address of the person or organization
responsible for the data are listed here.

EXAMPLE: ACCESS

The criminal terminations tape for years 1972 to 197?
is located on the reel 021477.
It is recorded in 9-track ASCII and in 1600 BPI, has
a record length of 80, and is unlabeled.
It may not be released to the public without special
authorization from xxxxxxx. It is sorted by district
and maintained by John Doe, 202-555-634V, NCES, 1520 H
Street, Washington, D.C.
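
Format details such as the record length and character code are what allow
a programmer at another site to read the file. As an illustration, the
Python sketch below reads fixed-length 80-character ASCII records from a
byte stream that has no line delimiters. It assumes the tape contents have
already been copied to a file or buffer; the function name is hypothetical.

```python
import io

RECORD_LENGTH = 80  # record length stated in the access example


def read_fixed_records(stream, record_length=RECORD_LENGTH):
    """Yield fixed-length ASCII records from an undelimited binary stream."""
    while True:
        chunk = stream.read(record_length)
        if not chunk:
            break
        if len(chunk) != record_length:
            raise ValueError("trailing partial record of %d bytes" % len(chunk))
        yield chunk.decode("ascii")


# Illustrative buffer holding two 80-byte records.
buffer = io.BytesIO(b"A" * 80 + b"B" * 80)
records = list(read_fixed_records(buffer))
```

A partial final record raises an error rather than being silently dropped,
since a short record usually signals that the documented record length is
wrong.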


I. MODIFICATIONS FOR SECONDARY ANALYTIC STUDIES

The preceding instructions for preparing file-level documentation have
been developed for files containing data collected specifically for the
study being archived. However, many studies do not collect data, but
rather utilize data drawn from other studies and sources. To document these
secondary analytic studies, it is necessary to use a slightly amended form
of the standard. Guidelines for writing components A-C and E3-H need not
be changed, but components D-E2 should be modified as follows:
D. Time Information

   1. Time Frame of Original Data Collection: When?
   2. Time Frame of Original Data Collection: How Often?
   3. Original Data Time Frame

E. Data Collection and Modification Information

   1. Description of Original Data Sources
   2. Description of Present Sample
   3. Merging/Reformatting Performed in Present Study

Items D1-3 are obvious changes since, by definition, no data set is
generated in a secondary analytic study.

Items E1-3 differ considerably in this version of the standard from
those in the standard for files from primary analytic studies. The original
source(s) of data are described briefly, including size of sample, type of
data, and characteristics of those sampled. If archive users are interested
in descriptions of the sampling strategy used by the original data collectors,
this information and the universe description can be obtained from the Final
Study Report listed in the bibliography accompanying the description of
the substudy in which the data were obtained.

Of greater interest to archive users is the sample used in the secondary
analysis; this sample may either be the entire original sample or a subset
of the original. If it is the latter, the sampling strategy employed in the
secondary analysis should be included in item E2.

In a secondary analytic study, there is no instrument to be described
since there was no data collection activity. However, in such a study,
the parallel activity is modification of the original data, which is accomplished
by merging and/or reformatting the previously existing data files.

II. CODEBOOK



The codebook contains complete information about each data item
contained in the file. A separate codebook is prepared for each record
type within hierarchical file types.

The specific components of a codebook are:

    • variable name;
    • reference number;
    • variable label;
    • location specifier;
    • file identifier;
    • missing values;
    • question text;
    • response codes;
    • response labels;
    • response descriptions;
    • notes.



All of these components are incorporated into a format illustrated
in Figure 2. The purpose of this format is consistency of presentation.
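
As a rough illustration of how the components listed above fit together, one
codebook item could be held in a single record. This is only a sketch: the
class name, the field names, and every sample value (the question text, the
location, the file identifier) are assumptions made for the example and are
not taken from Figure 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CodebookEntry:
    """One data item described by the codebook components (illustrative names)."""
    variable_name: str
    reference_number: int
    variable_label: str
    location_specifier: str
    file_identifier: str
    missing_values: str = ""
    question_text: str = ""
    # response code -> short response label
    response_labels: Dict[str, str] = field(default_factory=dict)
    # response code -> fuller description, used only where the label is not enough
    response_descriptions: Dict[str, str] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)


# Hypothetical item; none of these values come from a real file.
sex_item = CodebookEntry(
    variable_name="SEX",
    reference_number=5,
    variable_label="DEMOGRAPHICS: RESPONDENT'S SEX",
    location_specifier="1/12-12",
    file_identifier="C1",
    missing_values="9",
    question_text="What is your sex?",
    response_labels={"1": "FEMALE", "2": "MALE"},
)
```

Keeping every component in one record makes it easier to print all items in
the same layout, which is the consistency the format is meant to provide.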

The instructions for creating codebook components contain restrictions
on labeling and variable length. These restrictions are based on the
capabilities and requirements of SPSS. We chose SPSS because it is by
far the most widely used statistical analysis package; compliance with its
labeling conventions ensures the greatest potential use of the codebooks
produced.

1. Variable Name

A variable name is assigned to each data item and is used
by SPSS or other statistical processing systems to identify the data items
selected for analysis. It is a short name for a data field and consists
of not more than eight characters, the first of which must be a letter.
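
The two stated rules (at most eight characters, first character a letter) can
be checked mechanically. The sketch below is not part of the standard; it
additionally assumes that the remaining characters are letters or digits,
which the text does not state, and the function name is invented for the
example.

```python
import re

# Stated rules: 1 to 8 characters, the first a letter.
# Assumption (not in the text): the remaining characters are letters or digits.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]{0,7}")


def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` satisfies the length and first-character rules."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_variable_name("GRADE")` is true, while
`is_valid_variable_name("OCCUPATION")` is false because the name has ten
characters.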

Variable names may be either descriptive (verbal) or numerical. Examples
of descriptive variable names are "sex," "age," and "grade." Often,
the eight-character limitation demands that the variable name be
abbreviated. The variable name you have chosen should still give
researchers a clue to the content of the question.

Numerical variable names can be created as references to questions
on a survey instrument, to items on an administrative form, or to variables
in a study. When using SPSS, the first character in numerical variable
names must be a letter, for instance, Q7 (question reference) or V101 (variable
reference).

There are two advantages in numeric variable names. First, these
numbers provide the researcher with a reference that ties the codebook to
the instrument. Secondly, the person preparing the codebook can devise
such variable names with the assurance that no variable names
are used more than once. In multipart questions where 20 data items form
Q1, this naming system can be a little confusing. Usually, this confusion
can be resolved by using the designations Q1A, Q1B, etc.

In codebooks of studies involving a very large number of data items,
variable names may be numbers preceded
by the letter "V," e.g., V1. Using these "V" numbers is
sometimes simpler than trying to remember what the abbreviation "NHRKIBS"
means.



2. Reference Number

A reference number is assigned sequentially to each data field in the
file, beginning with the number "1." A researcher uses reference numbers
to indicate those items she/he needs for her/his analysis of the data.
When a researcher uses the codebook to select a subset of variables from
the original data file, these reference numbers remain the same. Therefore,
the data item with the reference number "5" in the complete file is reference
number "5" in a subsetted file, even though it may actually be the first
variable on the subsetted file.

Reference numbers are also useful for cross-referencing within the
codebook. A note on one data item may refer to a previously defined item.
For example, an item which sought to determine the most serious behavior
problem teachers encountered this school year might carry the note, "Reference
#306 concerned the most serious behavior problem teachers experienced last
year."
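
The rule that reference numbers survive subsetting can be pictured with a
small sketch. The list-of-pairs layout, the variable names, and the function
name are all hypothetical; the point is only that a subset keeps the numbers
assigned in the complete file rather than renumbering from 1.

```python
def subset_by_reference(codebook, wanted):
    """Keep the (reference_number, variable_name) pairs whose number is in `wanted`.

    The reference numbers are carried over unchanged, so they still match
    the complete file's codebook.
    """
    wanted = set(wanted)
    return [(ref, name) for ref, name in codebook if ref in wanted]


# Hypothetical complete file with five data fields.
full_file = [(1, "ID"), (2, "DISTRICT"), (3, "GRADE"), (4, "AGE"), (5, "SEX")]

# "SEX" is the first variable on the subsetted file but keeps reference number 5.
subset = subset_by_reference(full_file, [5])
```

Because the numbers are stable, a note such as "See Reference #306" remains
valid no matter which subset of the file a researcher receives.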

3. Variable Label

The variable label summarizes the content of an item and identifies
it more completely than the variable name. When a verbally descriptive
variable name is used to identify an item, the variable label is an expansion
of that name. When a numerical designation is used to identify an item,
the variable label is the researcher's introduction to the content of the
item. Variable labels are subject to a few constraints: they may not be
more than 40 characters in length and may not contain the characters "/,"
"(," or ")".

Very often, the variable labels are listed alphabetically at the end
of the codebook in an appendix called the "variable label dictionary."
The purpose of this dictionary is to allow researchers to scan the labels
and identify items of interest. For this reason, the variable labels should
be composed so that the most descriptive word appears first. In this way,
a number of related items would also be grouped together in the dictionary.
For instance, the following labels might be assigned to a group of items
pertaining to demographic information.

    DEMOGRAPHICS: RESPONDENT'S AGE
    DEMOGRAPHICS: RESPONDENT'S EDUCATION
    DEMOGRAPHICS: RESPONDENT'S INCOME
    DEMOGRAPHICS: RESPONDENT'S OCCUPATION
    DEMOGRAPHICS: RESPONDENT'S RACE
    DEMOGRAPHICS: RESPONDENT'S RELIGION
    DEMOGRAPHICS: RESPONDENT'S SEX
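
The label constraints and the alphabetical dictionary lend themselves to a
simple check. The following is a minimal sketch under the rules stated for
variable labels (no more than 40 characters; no "/", "(" or ")"); the function
names and the pairing of labels with reference numbers are assumptions made
for illustration.

```python
MAX_LABEL_LENGTH = 40
FORBIDDEN_CHARACTERS = set("/()")


def check_variable_label(label):
    """Return a list of problems found in `label` (empty if it is acceptable)."""
    problems = []
    if len(label) > MAX_LABEL_LENGTH:
        problems.append("longer than 40 characters")
    if FORBIDDEN_CHARACTERS & set(label):
        problems.append("contains '/', '(' or ')'")
    return problems


def build_label_dictionary(entries):
    """Sort (reference_number, label) pairs alphabetically by label."""
    return sorted(entries, key=lambda entry: entry[1])


# Hypothetical reference numbers paired with labels.
entries = [
    (12, "DEMOGRAPHICS: RESPONDENT'S SEX"),
    (40, "CRIME SCHOOL: POVERTY"),
    (7, "DEMOGRAPHICS: RESPONDENT'S AGE"),
]
dictionary = build_label_dictionary(entries)
```

Because the most descriptive word comes first, sorting by label places the two
DEMOGRAPHICS items next to each other in the dictionary.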

Choosing an appropriate label involves a great deal of guesswork, since
the archivist must try to determine what topics will be most interesting
to most researchers. A file can be analyzed in so many ways that it is
not always possible to create variable labels that tell every researcher
what s/he wants to know.

When a variable forms a part of a series (e.g., a question asks which
of a series of events happened and directs the respondent to circle all
that apply), each variable label can contain a specific description of the
event and a very brief reference indicating that it is part of a series.
For example, a series may ask the respondent to "Rank these factors in order
of their significance in school crimes today"; each factor is coded as a
separate item. Each variable label may be coded as "CRIME SCHOOL: {factor},"
for example:

    CRIME SCHOOL: POVERTY
    CRIME SCHOOL: LACK OF DISCIPLINE
    CRIME SCHOOL: RACIAL TENSIONS
    CRIME SCHOOL: BROKEN FAMILY
    CRIME SCHOOL: URBAN ENVIRONMENT

In spite of such flexibility, it may still be impossible to create
a variable label within the length constraints.
In some cases, the length constraint poses problems in composing meaningful,
descriptive labels and requires that the actual question text be abbreviated.
The abbreviations used attempt to convey the essence of the question; extremely
cryptic descriptions are avoided. When creating these designations, words
should be abbreviated from right to left, omitting suffixes, connectives,
articles, etc., in an attempt to maintain comprehensibility.

In cases similar to the example above (i.e., a series of related items
investigating a common factor), an explanation of the meaning of the abbreviation
could be given in an appendix to the codebook. Here are some examples of
abbreviated variable labels used for a series of items that were part of a
question, "As far as you know, which of the measures on this card were used
to determine the eligibility of public schools in this district for this
year's Title I program?"

    ELIGIBILITY, TI, P: CENSUS DATA
        (TI = Title I; P = public school)
    ELIGIBILITY, TI, P: AFDC ENROLLMENT
    ELIGIBILITY, TI, P: FREE BREAKFAST
    ELIGIBILITY, TI, P: FREE LUNCH
    ELIGIBILITY, TI, P: % NON-ENG SPKG

As this example demonstrates, the use of adequately defined abbreviations
can convey a great deal of information within the 40-character limit.

4. File Identifier

The file identifier is an abbreviated reference code used to describe
a particular data file. It contains not more than eight characters which
represent a substudy and a particular file from that substudy. For example,
in the study of compensatory education, there were six substudies, one of
which was the demonstration substudy. The data from the demonstration
substudy were contained in 11 files. The letter "C" was used to designate
the demonstration substudy and each file was simply numbered from
1 to 11. Thus, the file identifiers for the demonstration substudy
were: C1, C2, C3, ... C11.

The file identifiers used in the Education Voucher Demonstration Archive
were more descriptive. The six community surveys were identified as CSURV
plus the season and year of their administration, i.e., CSURVS73 = Community
Survey, Spring 1973.
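
The two naming schemes described above can be reproduced with a few lines of
code. This is a sketch only; the helper name and its two-part signature are
assumptions, and the only rule enforced is the stated eight-character limit.

```python
def make_file_identifier(prefix, suffix):
    """Join a substudy prefix and a file suffix, enforcing the 8-character limit."""
    identifier = f"{prefix}{suffix}"
    if len(identifier) > 8:
        raise ValueError(f"file identifier {identifier!r} exceeds eight characters")
    return identifier


# Demonstration substudy: the letter C plus the file number 1 to 11.
demonstration_ids = [make_file_identifier("C", number) for number in range(1, 12)]

# Community survey, spring 1973: CSURV plus season and year.
survey_id = make_file_identifier("CSURV", "S73")
```

A descriptive identifier such as CSURVS73 uses the full eight characters, so
longer season or year codes would not fit under this scheme.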



5. Location Specifier

The location specifier describes the physical field location of the
data item within each record. Location is usually defined as "card/starting
column-ending column." If a file consists of only one record per case,
the specifier includes only the column numbers, as follows: "starting column-
ending column." For instance, a data item located in columns 10 through
14 of card 5 would have a location specifier of 5/10-14. If a data item
contains an implied decimal point, the number of places to the right of
the decimal is noted in parentheses after the location specifier. For example,
the wage data item is located in columns 10 through 17 and contains two
decimal places. Its location specifier would therefore be "10-17(2)."
If the data item contains alphabetic information, as in the case of a state
variable, the location field is followed by the character "(A)," indicating
an alphabetic field. For instance, the state field is located in columns
56 and 57 and contains two-character postal service abbreviations. Its
location specifier would be "56-57(A)."
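
To show how a program might use these specifiers, the sketch below parses the
three forms described above and pulls a field out of a fixed-width record. The
names `Location`, `parse_location`, and `extract` are invented for the
example, and it assumes columns are numbered from 1 and that numeric fields
contain only digits.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    card: Optional[int]   # None when the file has one record per case
    start: int            # starting column, counted from 1
    end: int              # ending column, inclusive
    decimals: int         # implied decimal places (0 if none)
    alphabetic: bool      # True when the specifier ends in "(A)"


_LOCATION_PATTERN = re.compile(r"(?:(\d+)/)?(\d+)-(\d+)(?:\((\d+|A)\))?")


def parse_location(spec: str) -> Location:
    """Parse specifiers such as '5/10-14', '10-17(2)' or '56-57(A)'."""
    match = _LOCATION_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognized location specifier: {spec!r}")
    card, start, end, suffix = match.groups()
    return Location(
        card=int(card) if card else None,
        start=int(start),
        end=int(end),
        decimals=int(suffix) if suffix and suffix != "A" else 0,
        alphabetic=(suffix == "A"),
    )


def extract(record: str, location: Location):
    """Read one field from a record, applying any implied decimal point."""
    raw = record[location.start - 1:location.end]
    if location.alphabetic:
        return raw
    value = int(raw)
    return value / (10 ** location.decimals) if location.decimals else value
```

With this sketch, a record whose columns 10 through 17 hold `00012550` yields
125.5 for the specifier "10-17(2)", since two decimal places are implied.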

6. Missing Values

The missing values field contains information describing the missing
data codes in the file. This field can contain data in one of three forms:

    • code, code, code    a list of the individual codes which signify
                          a missing value;

    • code-code           a range of missing value codes (- = through);

    • <code or >code      all values less than, or greater than, the specified
                          code signify missing data.

Missing value codes and their meanings should also be included in the
value codes section.
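
The three forms can be turned into a test that a program applies to each value
it reads. The sketch below is one possible reading, not a prescribed
implementation: it assumes the codes are non-negative numbers (so that "-"
always means "through"), and the function name is hypothetical.

```python
def parse_missing_values(spec):
    """Return a function that tells whether a numeric value is a missing-data code.

    Accepts the three forms described above: 'code, code, code', 'code-code',
    and '<code' or '>code'. Assumes non-negative numeric codes.
    """
    spec = spec.replace(" ", "")
    if not spec:
        return lambda value: False
    if spec[0] in "<>":
        limit = float(spec[1:])
        if spec[0] == "<":
            return lambda value: value < limit
        return lambda value: value > limit
    if "-" in spec:
        low, high = (float(part) for part in spec.split("-"))
        return lambda value: low <= value <= high
    codes = {float(code) for code in spec.split(",")}
    return lambda value: value in codes
```

For instance, `parse_missing_values("97-99")` returns a function that is true
for 98 and false for 96.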



7. Question Text

The actual question text as it appears on the data collection instrument
is reproduced here. If an instrument was not used to collect the data,
the contents of the data field and its derivation are completely described.
For instance, the "question text" of an item on an employment application
might read, "Line 13. The applicant wrote his/her current employer's address
on this line." The text for a survey-type instrument includes the question
number (e.g., Q1A, Q3B), unless the question number was used as the variable
name, as described in section II.1.

If a question relies on the preceding question for its full meaning,
the question is clarified. To assist the researcher in knowing exactly
how the question was asked, the clarifying text is placed in brackets or
parentheses. For example, a series of questions asking about educational
attainment has a two-part section; the first part is "Do you have an additional
degree?" (answered "yes" or "no"); if the respondent answers "yes," s/he
then answers the second part, "What is it?" The question text for "What
is it?" would become: "(If you have an additional degree) What is it?"

Interviewer's or respondent's instructions printed as part of the question
are usually deleted when the question text is recorded. For example, in
the question "Here is a list of factors some people say affect crime rates
(HAND CARD B). Please choose the three factors you think have the most
influence on crime rates," the phrase "(HAND CARD B)" would be deleted.
This information is conveyed in the notes: "The respondent was given a
card listing factors to help him/her answer this question."
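
Where instruments mark instructions consistently, the deletion can be partly
automated. The sketch below assumes, purely for illustration, that
instructions are parenthesized phrases written entirely in capitals, as in
"(HAND CARD B)"; real instruments may mark them differently, and clarifying
text in mixed case is deliberately left alone.

```python
import re

# Assumption: an instruction is a parenthesized phrase of capitals, digits,
# spaces and simple punctuation, e.g. "(HAND CARD B)".
_INSTRUCTION_PATTERN = re.compile(r"\s*\([A-Z0-9 ,.'-]+\)")


def strip_instructions(question_text):
    """Return (cleaned question text, list of removed instructions)."""
    removed = [match.strip()[1:-1]
               for match in _INSTRUCTION_PATTERN.findall(question_text)]
    cleaned = _INSTRUCTION_PATTERN.sub("", question_text)
    return cleaned, removed
```

The removed phrases are returned rather than discarded so that the archivist
can restate them in the notes for the item.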

8. Value Codes

A value code is keyed for each response to each question. The codes
have very little meaning unless they are presented with the value label (9)
and the value description (10) described below. For example, a sex question
may have a value code of "1" for female and "2" for male. If the variable
has a very large number of codes, such as a district or offense field or
responses to an open-ended question, a list of codes and their meanings
is placed in an appendix rather than the main codebook to save room. The notes
component contains an indication that this has been done: "See the appendix
for a list of value codes and their meanings."

9. Value Labels

Value labels concisely describe the meaning of each value code. These
labels cannot exceed 20 characters and cannot contain the characters "/,"
"(," or ")." In many cases, abbreviations are necessary. As in the case of
variable labels, abbreviating proceeds from right to left, conveying as
much information as possible, so that the meanings of the labels are at
least evident through context. The value label is similar to the variable
label in that it is often a summary. Just as the question text can be relied
upon to elaborate on the meaning of the variable label, the value description
expands on and clarifies the meaning of the value label.

10. Value Descriptions

Response descriptions are complete descriptions of the response codes
and are only used when the response labels do not adequately convey the
meaning of the response codes. If applicable, each of these descriptions
includes the actual information as it appears on the data collection form.
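
The relationship among value codes, value labels, and value descriptions can
be sketched as two lookup tables. The dictionaries, function names, and the
"undocumented code" fallback below are assumptions for illustration; only the
20-character limit and the forbidden characters come from the text.

```python
MAX_VALUE_LABEL_LENGTH = 20
FORBIDDEN_CHARACTERS = set("/()")


def label_problems(value_labels):
    """Return {code: reason} for labels that break the stated constraints."""
    problems = {}
    for code, label in value_labels.items():
        if len(label) > MAX_VALUE_LABEL_LENGTH:
            problems[code] = "longer than 20 characters"
        elif FORBIDDEN_CHARACTERS & set(label):
            problems[code] = "contains '/', '(' or ')'"
    return problems


def describe(code, value_labels, value_descriptions):
    """Prefer the full description when one exists, otherwise the short label."""
    if code in value_descriptions:
        return value_descriptions[code]
    return value_labels.get(code, "undocumented code")


# Codes from the sex question example: 1 for female, 2 for male.
sex_labels = {1: "FEMALE", 2: "MALE"}
```

A description is consulted first because it is recorded only when the label
alone does not convey the meaning of the code.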

11. Notes

The "notes" section of codebook items has many uses. Generally, it
is used to provide the researcher with important additional information
about an item. It can tell which of a group of respondents answered a particular
question and under what circumstances. It can include interviewer's
instructions, explanations of strange response codes, and descriptions of
unusual frequency distributions. In addition, it may refer the researcher
to other closely related questions or to information in an appendix which
may be of interest.

In preparing the codebooks for the Compensatory Education Archive and
the Education Voucher Demonstration Archive, it became apparent that a number
of notes were used over and over again. We will describe some of these
notes and their uses here, first, to provide the archivist with some
ready-made notes, and, second, to illustrate the types of function notes can serve.

One of these functions is to tell who answered the question and why.
To tell who, we used a "universe" note, with the following format.

    UNV: Q6=2

This note was part of the codebook description of question 7. The
"equation" means that all the respondents to question 7 responded to question
6 with an answer that had a value of 2. The note appended to question 6
read: "If 2, go to Q7; if 1, 3, 4, or 5, skip to Q8."

Sometimes, there are several sequences that different respondents could
follow to arrive at this same question. In this case, the note reads:

    UNV: Q9=2/Q11=1/Q12=1    (/ = "or")

Or, there may be more than one condition that must be met before a
respondent answers a question:

    UNV: Q44=1 & Q46=3
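
A universe note of this kind can be evaluated against one respondent's answers
to decide whether that respondent belongs to the universe of a question. The
sketch below is illustrative only: the function names and the dictionary of
answers are hypothetical, and where a note mixes "/" and "&" it assumes that
"&" binds more tightly, which the notes themselves do not specify.

```python
def _condition_holds(condition, answers):
    """Check one 'Q6=2' style condition against a respondent's answers."""
    question, value = condition.split("=")
    return answers.get(question.strip()) == int(value)


def in_universe(note, answers):
    """Evaluate a note such as 'UNV: Q9=2/Q11=1/Q12=1' for one respondent.

    '/' separates alternatives ("or"); '&' joins conditions that must all
    hold ("and"). `answers` maps question names to integer value codes.
    """
    expression = note.split(":", 1)[1]
    for alternative in expression.split("/"):
        conditions = alternative.split("&")
        if all(_condition_holds(c, answers) for c in conditions):
            return True
    return False
```

For example, `in_universe("UNV: Q6=2", {"Q6": 2})` is true, while a respondent
who answered 3 to question 6 falls outside the universe of question 7.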

Interviewer's instructions include any information that the interviewer
did not read to the respondent, such as:

"The interviewer handed the respondent card E to facilitate answering
this question."

"The interviewer did not read the value 3 response. It was coded only
if the respondent volunteered this information."

"If the respondent answered no, the interviewer circled A on the instrument
and skipped to Q8."

"The interviewer was directed to look on page 4 for the name of the
child referred to in question 80."

"The probe, What others? was used with this question and asked only
once."

To help the researcher locate related items and information, notes
similar to the following could be used:

"See Reference #106 for respondent's past experience in this field."

"The child to which this series of questions refers is the "Kish Kid,"
randomly selected by a method devised by Leslie Kish and described
in his book, Survey Sampling."

"A list of values and their meanings appears in the appendix."

III. SUPPLEMENTAL INFORMATION

A. BIBLIOGRAPHY

The bibliography lists reports based on the data in the file. It may
also list reports on the same subject based on other data and background
information. It gives exact information on where to obtain
copies of unpublished reports. Reports based on data from more than one
file within a study or substudy will not normally be included here. Such
reports should be listed in the study or substudy bibliographies described
in Volume III: Project-Level Documentation.

B. FILE HISTORY

This section identifies the individuals and organizations responsible
for various aspects and sections of the data file. Chronologically ordered,
it contains the name of each person or organization involved, describes
his/her role in the creation of the data file, and gives the dates of his/her
activities. Minimally, it identifies the data collection contractor, analysis
contractor, data management contractor, and other individuals or organizations
who are involved with the data or have subjected them to significant analyses.

C. APPENDICES

1. Original Project Documents

Copies of original data collection forms, coding instructions, and
other documents comprise this appendix.

2. Data Editing

If the data set was extensively edited and "cleaned," a copy of the
cleaning specifications appears in this appendix. If a formal report on
the cleaning process was issued, a summary of its major findings is also
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